

ABSTRACT 

The goal of this study is to provide a single source of data that enables the 
selection of an appropriate voice recognition (VR) application for a decision 
support system (DSS) as well as for other computer applications. A brief 
background of both voice recognition systems and decision support systems is 
provided with special emphasis given to the dialog component of DSS. The 
categories of voice recognition discussed are human factors, environmental 
factors, situational factors, quantitative factors, training factors, host computer 
factors, and experiments and research. Each of these areas of voice recognition is 
individually analyzed, and specific references to applicable literature are included. 


This study also includes appendices that contain: 


• A glossary (including definitions) of phrases specific to both decision support systems and voice recognition systems.

• Keywords applicable to this study.

• An annotated bibliography (alphabetically and by specific topics) of current VR systems literature containing over 200 references.

• An index of publishers.

• A complete listing of current commercially available VR systems.




TABLE OF CONTENTS

I. INTRODUCTION
   A. BACKGROUND
   B. VOICE RECOGNITION SYSTEMS
   C. DECISION SUPPORT SYSTEMS
   D. GOALS AND OBJECTIVES
   E. SCOPE AND METHODOLOGY
      1. Scope
      2. Research Methodology
II. DATA ANALYSIS
   A. BACKGROUND
   B. HUMAN FACTORS
      1. Stress Related Factors
      2. Multimodal Factors
      3. Speaker's Experience Level
      4. Computer Experience Level
      5. Vocabulary Factors
   C. ENVIRONMENTAL FACTORS
      1. Multilingual Factors
      2. Multicultural Factors
      3. Command and Control Environments
      4. High Noise Environments
      5. Low-Light Environments
   D. SITUATIONAL FACTORS
      1. Multiuser or Group Usage
      2. Individual Usage
      3. Handicap Situations
   E. QUANTITATIVE FACTORS
   F. TRAINING FACTORS
   G. HOST COMPUTER FACTORS
   H. EXPERIMENTS AND RESEARCH
APPENDIX A GLOSSARY OF TERMS
APPENDIX B KEYWORDS
APPENDIX C ANNOTATED BIBLIOGRAPHY
   APPENDIX C1 HUMAN FACTORS
   APPENDIX C2 ENVIRONMENTAL FACTORS
   APPENDIX C3 SITUATIONAL FACTORS
   APPENDIX C4 QUANTITATIVE FACTORS
   APPENDIX C5 TRAINING FACTORS
   APPENDIX C6 HOST COMPUTER FACTORS
   APPENDIX C7 EXPERIMENTS AND RESEARCH
APPENDIX D PUBLISHER INDEX
APPENDIX E CURRENT VOICE RECOGNITION SYSTEMS
LIST OF REFERENCES
INITIAL DISTRIBUTION LIST


LIST OF FIGURES

Figure 1.1. The Dialog, Data, Model Components of the DSS Framework
Figure 1.2. The Dialog System User Interface
Figure 2.1. Typologies of Group Decision Support Systems


ACKNOWLEDGMENTS 


I wish to express my thanks to my thesis advisor, Professor Judy Lind, for introducing me to the world of voice recognition, thus allowing me to experience what I had only dreamed of after seeing science fiction movies. I am also grateful for her understanding of my time constraints.

My sincerest thanks to my wife Ann for her love, help, understanding, and that "I know you can do it" encouragement that I appreciate so much.


I. INTRODUCTION 


A. BACKGROUND 

The rapid influx of powerful microcomputers has provided both the incentive 
and capability to enhance the productivity of humans. These powerful and 
inexpensive workhorses are being exploited for automating routine tasks, acquiring 
and communicating information, and the intelligent support of decision making. 
Of major importance is the effort to enhance the productivity of humans who 
control these machines through the use of human-computer interfaces that both 
maximize human performance and take advantage of the growing capabilities of 
these computer systems. 

It is estimated that, for over 95 percent of human-computer interactions, people costs are greater than machine costs [Infotech 79]. Actions that reduce the human cost and simplify the human interface will have a great impact on the computer industry. Computer technology must explore these interfaces in order to grow and develop to its full potential.

Many forms of man-machine interfaces have been developed, including cathode ray tube displays, printers, keyboards, and joysticks. However, speech is recognized as the most natural and fastest form of human communication and should be considered as an interface technique for system optimization [LeFever 87].

Research into voice recognition (VR) systems has been ongoing for over 30 years. Research into decision support systems (DSS), which evolved from management information systems over 15 years ago, now is maturing. The two technologies, which until now have matured separately, are logical candidates for merging. Thus the focus of this study is the application of voice recognition systems to decision support systems. A Glossary of Terms used in this study is provided in Appendix A.


B. VOICE RECOGNITION SYSTEMS 

Voice recognition is defined as the ability of a computer or other device to 
recognize spoken words correctly and to translate them into a predetermined 
output string to the computer [LeFever 87]. Voice recognition is also called 
automatic speech recognition and by other names, as listed in Appendix B. It is 
important to note that the term voice recognition refers to and concerns only 
command input via the human voice. It does not include computerized voice output 
or speech synthesis. 

There are many advantages to using voice input to computer systems. In general, a voice recognition system:
• is more accurate than conventional forms of input
• allows for concurrent use of hands, eyes, and other senses
• allows freedom of movement from a specified location
• can be used in low light or dark areas
• is faster than conventional forms of input
• promotes the use of the computer system or application with which it is used
• is easy to learn and easy to use
• promotes productivity
• works better in multilingual environments than conventional input
• works equally well for individuals ranging from novice typists through expert typists
• works well for many handicapped individuals [Poock 80, Poock 81, Armstrong 80, Baker 84, LeFever 87]


Dobney classifies voice recognition as "a fifth generation language or more concisely a fifth generation concept." [Dobney 87] Voice recognition, along with other fifth generation concepts, is expected to be critical to the future of all computer applications.


C. DECISION SUPPORT SYSTEMS 
There is no generally recognized single definition of decision support systems. 
The definitions in use cover a broad spectrum of what is and is not a DSS [Keen 87]. 


For this study, the following definition will be used: 


The application of available and suitable computer-based technology to 
help improve the effectiveness of managed decision making in semi- 
structured tasks. [Keen 87] 


The key aspects of DSS include: 

• They are computer-based systems.

• They are used by decision makers.

• They help decision makers confront ill-structured problems.

• They work through direct interaction.

• They utilize data analysis models. [Sprague 82]


This study will focus on the fourth aspect, direct interaction between the decision 
maker and the computer system. 

The basic DSS has three components: data, dialog, and models [Sprague 82]. These are referred to as the DDM paradigm of a DSS, and the relationships are illustrated in Figure 1.1. The importance of the dialog component cannot be overemphasized, since all the capabilities of the DSS must be articulated and implemented through it.


(Figure: Dialog-Data-Model (DDM) Paradigm)
Figure 1.1. The Dialog, Data, Model Components of the DSS Framework [Sprague 82]


This dialog component consists of three subcomponents, as illustrated in Figure 1.2:


• The action language is what the user can do in communicating with the system.

• The presentation or display language is what the user sees.

• The knowledge base is what the user must know in order to operate the system. This can take the form of help menus, reference cards or instructions, a user's manual, or information that previously has been learned.


(Figure: Action Language (what you can input), Presentation Language (what you see), and Knowledge Base (what you need to know))
Figure 1.2. The Dialog System User Interface [Sprague 80]


This study primarily considers the action language of DSSs and its implementation through the use of voice input. Secondary consideration is given to minimizing the size of the knowledge base through the use of a natural language interface and to optimizing the presentation language so that it naturally encourages and prompts proper input.

No single all-encompassing or overall best dialog mode presently exists. That is, no system can handle a variety of human interaction styles, shifting between styles at the user's request. Regardless of a user's experience with computers or with the problem or task at hand, the specific dialog mode of a given system must be learned before the system can be used. This is true even if the user is already familiar with another dialog mode for another system.

As noted by Sprague, "Dialog will profit significantly from the inclusion of natural language processing techniques and voice recognition." [Sprague 87]


D. GOALS AND OBJECTIVES 

The primary objective of this study is to provide a current, concise, single source of data that will enable selection of an appropriate voice recognition application for a given decision support system. In essence, this is a non-automated aid for making voice recognition system decisions related to the design of an automated DSS.

A secondary objective is to provide users, developers, researchers, and all others concerned with voice recognition input with a current reference guide to voice recognition research. This guide is contained in Appendix C, an annotated bibliography of current VR literature, with subappendices that index the annotated bibliography by topic area. Keywords used in locating references are provided in Appendix B. Appendix D furnishes the publishing source of all literature contained in the annotated bibliography and thus facilitates retrieval of hard-to-find articles.

A third objective is to provide a current listing of all commercially available voice recognition systems. This list is contained in Appendix E, along with information concerning the compatibility of these voice systems with current computer systems. The voice recognition systems listed cover a wide range of capabilities and are usable on systems varying in size from mainframe computers to desktop microcomputers.

The overall goal of this study is to supply a useful guide for decisions concerning the implementation or use of voice input for decision support systems as well as for other computer applications.


E. SCOPE AND METHODOLOGY 
1. Scope 

This study primarily considers current voice recognition literature, that is, books, articles, and reports that are less than five years old (published after 1 January 1983). A limited amount of older literature, judged especially pertinent and worthy of note, also is included.

Keywords used in searching the literature are listed in Appendix B. 
Words representing voice and speech-related topics not included in this study also 
are listed there. No experiments or case studies were conducted for this thesis. 

2. Research Methodology 

Exhaustive research was conducted to identify all current and accessible 
voice recognition literature and voice recognition systems. This research was 
conducted using Naval Postgraduate School and University of California, Santa 
Cruz, resources and via locally accessible computer networks. 

The universe of papers from which the database was drawn consists of all 
literature that contains keywords listed in Appendix B. Initially over 1000 
references were located. These items were reviewed and filtered to determine 
those applicable to DSSs. As a result of a review process, over 230 articles were 
classified as applicable to DSSs and are included in the final database in the form of 
an annotated bibliography. In many cases this bibliography also contains excerpts, 
abstracts, or summaries of those articles related to voice recognition that are 
considered to be useful for users, developers, researchers, and others concerned with voice input to decision support systems.


II. DATA ANALYSIS 


A. BACKGROUND 

As fifth generation computer technology approaches, the use of “intelligent 
systems” will give increasing flexibility to the input devices of the future. The data 
collected for this study provides knowledge needed to pick the best method of 
human-computer interaction for the specific environments of a given DSS. 

It has been proposed that speech is the human's highest capacity and most natural form of communication [Lombardo 84]. Therefore, computer voice recognition would be the most natural way for humans to interface with machines. The main obstacle to the widespread acceptance of VR seems to be that most people are simply not aware that VR exists or what it can really do for them.

This chapter discusses various research areas or categories of both voice 
recognition systems and DSSs. Data are placed into several categories in order to 
facilitate locating answers to specific problems and to aid in performing research 
related to a specific DSS application or environment. These categories were 
arrived at through an empirical process of reviewing the reports and noting logical 
trends in the literature. Each research area is related to an appendix in this report containing references to articles germane to that area.


B. HUMAN FACTORS 
Categories of human factors included in this study are (1) stress, (2) multimodality, (3) user speaking experience level, (4) computer experience level, and (5) the size of the vocabulary. These topics are related to several human factors areas: occupational, operational, psychological, physiological, and personal [Yellen 83].

Human factors are discussed first because of their importance. No matter how fast the computer is, how efficient its speech recognition algorithm is, or how pretty its displays are, it will not be used effectively or efficiently unless human factors knowledge applicable to system implementation has been reviewed and incorporated.

Appendix C1, Section 1, contains a listing of material applicable to the area of human factors. Sections 2 through 6 of that appendix include references that are specific to each category within the scope of human factors.

1. Stress Related Factors 

Stress influences the sound wave frequency of an individual's speaking voice. Additionally, stressed speakers often appear to talk in longer bursts, with shorter pauses separating the bursts. Psychological stress also influences an individual's vocal production in other ways. However, there is no consensus in the literature concerning how stress can be analyzed to predict an outcome.

Stress may be physiological, psychological, or a combination of both. Physiological stress is more clear-cut than psychological stress and refers to the effects of physical stressors such as heat, pressure, electric shock, and similar stimuli. Psychological stress comes from many sources and relates to an individual's ability to cope with, adapt to, or react to an unfamiliar, unfriendly, or threatening environment, or to the influence of that environment on the individual.

Psychological stress can be further subdivided into situational and self-induced stress. Situational stress is the influence of unfavorable environmental factors (excluding physical factors) on an individual. These factors are beyond the individual's control and may include circumstances such as public speaking, deadlines, and quotas. Self-induced stress is the self-imposition of a condition or stimulus. Examples include self-imposed goals, deadlines, or performance requirements of any type that force an individual to function above a "comfortable" or "easy" level [French 83]. It is important to remember that in some cases it may not be possible to separate physical from psychological stimuli.

Research in the area of stress and voice recognition was found to be 
limited. References are listed in Section 2 of Appendix C1. 

2. Multimodal Factors 

Voice recognition systems are unique in their ability to free the user's mind and eyes for carrying out visual tasks. A voice recognition system permits the user to view graphics, screens, and decision aids, to oversee personnel, or to read from a data source without having to look away in order to communicate with the computer.

Baker states in her keynote address to the First International Conference on Speech Technology:


Just as Darwin hypothesized that people developed spoken rather than 
gestural language so as to free up their hands and be able to communicate 
in the dark or out of sight, so speech recognition has seen its initial 
applications in “hands busy, eyes busy” applications. [Baker 84] 


Voice recognition systems promise freedom from the distraction of interrupting the flow of work to recall codes and find keys. Voice recognition can free the operator from having to remain close to a specific physical installation, such as a video display terminal or keyboard. Additionally, the use of a wireless microphone permits extensive mobility while talking to computers. French states that


Voice-input could enable the operator to continue the task at the 
terminal, and simultaneously manipulate a visual representation of the 
problem they are involved in, for others’ benefit. This is a potential boom 
in the period of transition from a symbolic gestalt to an era of much more 
wide spread computer literacy. [French 83] 


As Yellen notes, this increased mobility also brings increased problems; breath noise can now become a serious problem [Yellen 83]. An individual who is involved in little or no physical movement while using voice recognition can obtain very high recognition accuracy, but errors may be induced once the user begins to move. When a close-talking, noise-canceling microphone is used, inhaling does not appear to cause problems; however, exhaling will produce signal levels comparable to speech levels.

The advantage of having one's hands, eyes, and mind free to perform other tasks could be the major contributing factor in the choice of voice recognition input to a computer application. This multimodal aspect of voice recognition enhances or complements traditional tactile input methods rather than replacing them entirely. A listing of literature related to the multimodal aspects of voice recognition is contained in Section 3 of Appendix C1.

3. Speaker's Experience Level 

Many studies have measured the relationship between a speaker's experience with voice recognition systems and the resulting quality of the output or task performance. The research in this area is referenced in Section 4 of Appendix C1.

Most studies agree that novices, regardless of their initial experience level, quickly pick up voice recognition system skills and that their performance improves rapidly toward the level of experienced users. It is important to note that professional typing skills require a long learning period and diminish quickly with disuse. Speaking, on the other hand, is a natural output mode for humans and is practiced every day by everyone. The user has only to restrict spoken utterances to those that the machine can recognize.

4. Computer Experience Level 

It is a credit to the adaptability of humans that they can use today's software when so much of it still abounds with non-memorable commands. Complex multiple command/control/shift keystrokes often are required, which can only be recalled by constant and experienced users. Commands that require precise syntax, spacing, and order can be simplified by the use of voice commands. Once the utterance is recognized by the computer, it is input correctly. Long commands or passwords that require accurate input and multiple keystrokes are easily mistyped, but can be input accurately with a voice recognition system.

The video display can provide directions for the next voice input through the use of menus or a graphical representation. This can be of special value to both DSSs and Group DSSs, enabling rapid "what if" brainstorming or generation of alternatives.

Section 5 of Appendix C1 provides a guide to publications that deal with a user's computer experience level. Many techniques that enable better performance, given a specific experience level, are listed in these articles.

5. Vocabulary Factors 

The vocabulary selected for a voice recognition system affects the speed and accuracy of the system in many ways. The selection and structure of the vocabulary are extremely important to the success of the system. The vocabulary should be as natural as possible, while avoiding conflicting, confusing, or similar-sounding utterances.

Most current voice recognition systems perform well with small 
vocabularies. When the size of these vocabularies gets large (greater than 1000 
utterances) the probability of error increases, along with the processing time. The 
possibility of confusion between words increases with vocabulary size also, as does 
the probability that similar sounding words have been included. Better speech 
recognition systems usually have recognition algorithms designed to reject rather 
than guess at questionable or similar words. 

Humans have a low tolerance level for waiting for machines and for 
machines that make errors; studies show that humans tend to abandon systems that 
perform in this manner. With very large vocabulary sets, the amount of data to be 
processed for each recognition is intolerably large unless coding is optimal and 
optimized comparisons are used. Accuracy is increased and recognition time 
decreased by using vocabulary subsets. A given subset usually is entered by saying 
the subset's name or title (also called the node word). Once in this subset or node, 
the system will search and recognize only the words included in this subset. This 
increases both speed and accuracy, and allows for different output for a given 
input. 

For example, a subset of numbers may be entered with the node word "number". Only words representing the numbers contained within the node will be recognized (along with node words, which exit the subset). This allows the use of homonyms (such as "two" and "to") without confusion. When in the "number" subset, the utterance "to" or "two" will produce an output of "2". When in other subsets, the utterance "to" or "two" will produce the output string "to" (or any other preprogrammed output desired).
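The node-word mechanism described above can be sketched as a small state machine. This is a minimal illustration, not any vendor's implementation: the class name `SubsetVocabulary`, the exit node word "main", and the treatment of each utterance as an already-matched word label are all assumptions made for the example. A real recognizer would compare the acoustic pattern against only the templates in the active subset.

```python
class SubsetVocabulary:
    """Vocabulary divided into subsets (nodes) entered by speaking a node word.

    Hypothetical sketch: utterances are shown as already-matched word labels.
    """

    def __init__(self, subsets, start):
        self.subsets = subsets  # node word -> {utterance: output string}
        self.active = start

    def hear(self, utterance):
        """Return the output string for an utterance, or None if nothing is output."""
        if utterance in self.subsets:
            self.active = utterance  # node words are always listened for
            return None
        # Only the active subset is searched; anything else is rejected.
        return self.subsets[self.active].get(utterance)


vocab = SubsetVocabulary(
    {
        "main": {"to": "to", "two": "to"},  # "main" is an assumed exit node word
        "number": {"one": "1", "two": "2", "to": "2", "three": "3"},
    },
    start="main",
)
vocab.hear("number")
print(vocab.hear("to"))   # 2
vocab.hear("main")
print(vocab.hear("two"))  # to
```

Because only the active subset is searched, the acoustically identical "to" and "two" can safely share the output "2" inside the "number" node while producing "to" elsewhere.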

The selected vocabulary can also be used to overcome problems with cumbersome or often-forgotten program commands by allowing various input utterances to result in the same output string. For example, each computer network has a specific command to log off or check out of the system. These usually differ from system to system, and it may be difficult to remember which is required for each system. Programming three or four different utterances that produce the same correct output command will alleviate this problem (e.g., "log out", "log off", "check out", and "bye bye" might all correspond to the output string "LOGOFF ^M"; saying any of them produces the desired result).
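One way to build such a many-to-one vocabulary, while also catching the conflicting entries that the vocabulary guidance above warns against, is to invert a table of output strings and the utterances that produce them. The function name `build_vocabulary` is hypothetical, and the output string "LOGOFF\r" (a command followed by a carriage return) is an assumed example rather than any particular network's command.

```python
def build_vocabulary(commands):
    """Invert {output string: [utterances]} into {utterance: output string}.

    Raises ValueError if one utterance would produce two different outputs,
    since the recognizer could not tell which command the speaker meant.
    """
    lookup = {}
    for output, utterances in commands.items():
        for utterance in utterances:
            previous = lookup.setdefault(utterance, output)
            if previous != output:
                raise ValueError(
                    f"{utterance!r} maps to both {previous!r} and {output!r}"
                )
    return lookup


# Assumed output string: the log-off command followed by a carriage return.
vocab = build_vocabulary({
    "LOGOFF\r": ["log out", "log off", "check out", "bye bye"],
})
print(repr(vocab["bye bye"]))  # 'LOGOFF\r'
```

Keeping the table keyed by output string makes it easy to add another synonym for an existing command, and the conflict check flags a phrase accidentally assigned to two commands before the vocabulary is loaded.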

Literature related to the area of speech recognition system vocabularies is referenced in Section 6 of Appendix C1.


C. ENVIRONMENTAL FACTORS 

The environment in which a system will be used can play a decisive role in the choice of the input device and the voice recognition system to be used. In a United Nations command center that is dark, noisy, and filled with people from many nations with varied languages and customs, typing commands to a computer in one language in a fixed syntax is not practical. A well-implemented voice recognition system can do this job faster and without the mistakes normally associated with human translators. This "Tower of Babel," in which one can communicate as if with one tongue, can be implemented with current technology through proper design.

References to environment-related studies and research are found in Section 1 of Appendix C2. Subsets of these references, related to specific environmental factors, are provided in Sections 2 through 6 of Appendix C2.

1. Multilingual Factors 

The UN example may be the extreme, but in this world of instant world- 
wide telecommunications, international businesses, and melting pot nations, 
computers frequently must interface with people who speak different languages. 
Voice recognition systems are unconcerned with what language is spoken. They 
operate by matching the pattern of a given voice input (utterance) with a known 
pattern and then outputting some predesignated command string, therefore acting 
somewhat like a translator. For example, three languages may be spoken in an 
office (English, Spanish, Hindi). The computer software requires input in English. 
It is impractical to teach all the personnel both English and the commands required 
to operate the computer. A voice recognition system could be installed that 
“understands” utterances in all three languages and outputs the English commands 
that the software requires. 
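The pattern matching described above can be sketched as nearest-template matching with a rejection threshold, which also reflects the earlier point that better recognizers reject rather than guess at questionable words. This is a simplified illustration under stated assumptions: each utterance is reduced to a short, made-up feature vector, templates are formed by averaging training samples, distance is Euclidean, and the threshold value is arbitrary. Commercial systems use their own features and matching methods.

```python
import math


def enroll(samples):
    """Average several training samples (feature vectors) into one template."""
    return [sum(column) / len(samples) for column in zip(*samples)]


def recognize(features, templates, reject_above):
    """Return the output string of the nearest template, or None to reject.

    templates is a list of (template_vector, output_string) pairs.
    """
    best_output, best_distance = None, math.inf
    for template, output in templates:
        distance = math.dist(features, template)  # Euclidean distance
        if distance < best_distance:
            best_output, best_distance = output, distance
    # Reject rather than guess when even the best match is too far away.
    return best_output if best_distance <= reject_above else None


# Made-up three-element feature vectors; two languages share one output.
templates = [
    (enroll([[1.0, 0.0, 0.2], [1.2, 0.1, 0.0]]), "EXIT"),   # English "exit"
    (enroll([[0.0, 1.0, 0.9], [0.2, 1.1, 1.0]]), "EXIT"),   # Spanish "salir"
    (enroll([[0.5, 0.5, 0.0], [0.6, 0.4, 0.1]]), "PRINT"),  # English "print"
]
print(recognize([0.1, 1.0, 1.0], templates, reject_above=0.3))  # EXIT
print(recognize([3.0, 3.0, 3.0], templates, reject_above=0.3))  # None
```

The recognizer never needs to know which language a template came from; it only maps the closest stored pattern to its predesignated output string, which is what lets speakers of different languages drive the same English-only software.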

Research and other literature related to voice recognition with 
multilingual environments is found in Section 2 of Appendix C2. 

2. Multicultural Factors 

Multicultural factors arise when different people have different ideas, styles, or ways of doing things. All computer operating systems perform similar functions, but there are subtle differences in the way commands are activated. For example, for a simple file transfer, the UNIX operating system uses a specific syntax that is completely different from that used by an IBM operating system. Switching between MS-DOS, Z-DOS, Apple DOS, and the Macintosh operating systems usually will require the user to look up the desired commands.

Voice recognition systems can ease these difficulties by doing the lookup for the user: the same phrase, "save and quit", can be programmed to produce the same result on all systems. Voice recognition can also help equalize the varied experience, training, and typing skills of workers or executives exposed to new systems or new situations.
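The lookup described above can be sketched as a table keyed by spoken phrase and host system. The table and the function `command_for` are hypothetical; the file-copy and directory-listing commands shown (`cp` and `ls` on UNIX, `COPY` and `DIR` on MS-DOS) are real, but a working system would emit complete command lines with arguments.

```python
# Hypothetical phrase table: one spoken phrase, one command per host system.
PHRASES = {
    "copy file": {"unix": "cp", "msdos": "COPY"},
    "list files": {"unix": "ls", "msdos": "DIR"},
}


def command_for(phrase, host):
    """Return the host-specific command for a spoken phrase, or None if unknown."""
    return PHRASES.get(phrase, {}).get(host)


print(command_for("list files", "msdos"))  # DIR
print(command_for("list files", "unix"))   # ls
```

The user learns one phrase per operation; only the table changes when a new host system is added.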

Literature sources related to multicultural factors are referenced in 
Section 3 of Appendix C2. 

3. Command and Control Environments 

Military establishments have done much work toward application of voice 
recognition systems in the command and control environment. The result of this 
work has been the acceptance and implementation of operational voice recognition 
systems in both strategic and tactical command and control environments. Most of 
this research can also benefit civilian business and industry applications. A listing 
of current research relating the areas of voice recognition systems and command 
and control is provided in Section 4 of Appendix C2. 

4. High Noise Environments 

Voice recognition systems have been used effectively in quiet office environments and also in noisy industrial assembly areas (noise levels in excess of 100 dB). Although voice recognition equipment manufacturers have endeavored to make their equipment work equally well in both environments, there are some locations where it is still too noisy for voice recognition systems to operate unaided. In such environments the use of a soundproof booth or a mask (such as a noise-reducing stenographer's mask) can help; external noise is diminished and effective voice recognition can take place.

Most researchers agree that, when using speaker dependent systems, "training" voice samples should be collected in the environment in which they will be used. This is especially true with noisy environments.

Another method of improving voice recognition in a noisy environment is
to use a speech enhancement algorithm, a software technique that cleans
up the speech pattern before it enters the recognition device. A noise-canceling
microphone (like those that have been used in aircraft for years) also can be used.
This microphone samples the environmental background noise and helps cancel
that noise before the signal is sent to the recognizer.
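The text does not describe how a noise-canceling microphone performs its subtraction. One standard technique for a two-pickup arrangement like the one described is adaptive noise cancellation with the least-mean-squares (LMS) algorithm. The Python sketch below is a minimal illustration under that assumption: the function names, the 8-tap filter, the step size, and the synthetic tone-plus-noise signal are invented for the example, and the reference pickup is assumed to hear the noise but not the speech.

```python
import math
import random


def lms_noise_canceller(primary, reference, taps=8, mu=0.01):
    """Remove an adaptive estimate of background noise from the speech channel.

    primary   -- samples from the talker's microphone (speech plus noise)
    reference -- samples from a second pickup assumed to hear only the noise
    Returns the cleaned samples, which are the LMS error signal.
    """
    weights = [0.0] * taps
    history = [0.0] * taps
    cleaned = []
    for d, x in zip(primary, reference):
        history = [x] + history[:-1]  # newest reference sample first
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate  # speech channel minus the current noise estimate
        # LMS update: move each weight in the direction that reduces error**2.
        weights = [w + 2 * mu * error * h for w, h in zip(weights, history)]
        cleaned.append(error)
    return cleaned


def demo(n=4000, seed=0):
    """Synthetic check: a tone (standing in for speech) plus random noise."""
    rng = random.Random(seed)
    noise = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    speech = [math.sin(2 * math.pi * k / 50) for k in range(n)]
    primary = [s + 0.8 * v for s, v in zip(speech, noise)]
    cleaned = lms_noise_canceller(primary, noise)
    half = n // 2  # measure only after the filter has had time to adapt
    before = sum((p - s) ** 2 for p, s in zip(primary[half:], speech[half:])) / half
    after = sum((c - s) ** 2 for c, s in zip(cleaned[half:], speech[half:])) / half
    return before, after


if __name__ == "__main__":
    print("noise power before/after:", demo())
```

Running demo() compares the noise power left in the speech channel before and after filtering. A larger step size adapts faster but leaves more residual noise, which is the usual LMS trade-off.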

When noise is a consideration in the environment, a close look at research
in this area is critical. Even for quiet office environments, an understanding of
noise as it relates to voice recognition is recommended. Most machinery makes
noise, some of it at frequencies that humans cannot hear or have learned to ignore
through familiarity. The noise of a car, airplane, copy machine, or elevator during
training or execution of voice recognition commands can result in puzzling
problems. Noise-related articles and research are listed in Section 5 of Appendix
C2.

5. Low-Light Environments 

Low-light environments include both dimly lit control rooms and 
completely darkened auditoriums. In these environments, lighting can interfere 
with the performance of the operators’ primary mission. The cockpit of an aircraft 
and the bridge of a ship are specific environments where good night vision is
paramount. During daylight, normal manual input devices are adequate. At night,
a light source can have life-threatening consequences. A voice recognition system
allows computer commands to be entered without looking at an input device, and it
leaves the operator free to move about.

Voice recognition systems can be used to control the lights in a room. A
more complex use would involve a microprocessor voice recognition system in a
welder's helmet that controls the welding unit, turning it on and off and
remotely controlling the voltage or gas flow.

References relating voice recognition systems to low-light environments
are listed in Section 6 of Appendix C2.


D. SITUATIONAL FACTORS

Situational factors covered in this study include (1) system use by a group, (2)
use by an individual, and (3) use by handicapped persons. Appendix C3, Section 1,
provides a complete list of voice recognition systems references related to such
situational factors.

1. Multiuser or Group Usage

A multiuser system is a single system that is used by many people, but only
one at a time. Group usage is the use of a system by many people during the same
time period. Both multiuser and group usage have similar problems and
characteristics and have thus been grouped together in this study.

Multiuser-oriented systems can be either speaker dependent or
independent. They can use either continuous or discrete speech recognition
algorithms. These terms are defined as follows.


• Speaker Dependent Systems: require adaptation (or "training") of the voice
recognition system to the speech characteristics of each user in order to
achieve recognition.

• Speaker Independent Systems: recognize speech regardless of the speaker,
without system training on the individual speech characteristics of users.

• Continuous Speech Recognition: the process of extracting information from
strings of words even though the words run together as in natural speech.
[Yellen 83]

• Discrete (Isolated) Speech Recognition: the process of transforming discrete
utterances (those with a significant pause between utterances) into computer-recognized speech or text.


Although speaker independent, continuous systems are better suited to multiple
users and require less training, other combinations should not be ruled out,
as they offer some advantages in specific circumstances. If the group situation also
involves environmental factors (such as in a multilingual, high noise command
post), the difficulty of selecting a system is compounded. Speed, vocabulary size,
or robustness may dictate that a speaker dependent, discrete speech system be used,
even though system training time is higher and voice samples must be collected
from each user.

Implementing voice recognition input to a Group Decision Support
System (GDSS) is difficult since there are four basic GDSS typologies, each
presenting its own problems. Figure 2.1 shows these four typologies [Bui 87].

Figure 2.1 (a) shows a bilateral relationship between a single-user-oriented
DSS and a group of users, the latter being considered as a whole. The
purpose of such a DSS is in essence the same as that of a single-user DSS [Bui 87]. In this
situation a voice recognition system that is robust enough to fit the needs of the
group is required. If the group is small and its composition constant, a
discrete, speaker dependent system (requiring system training by the users) is
practical. Otherwise, a speaker independent, continuous speech system would be
most appropriate: with a varying group, the cost and time required to sample and
train each user, and the constraints on vocabulary size, could be prohibitive. Figure
2.1 (b) extends the previous typology to include a GDSS, and has the same
associated problems.


[Figure 2.1, panel (a): Single User DSS]

Figure 2.1. Typologies of Group Decision Support Systems [Bui 87]


Figures 2.1 (c) and (d) illustrate a multilateral relationship between the
members of a group (via a network of individual DSSs) and a GDSS. This typology
allows the customization of individual DSSs to suit the needs of users. Currently
the cost of a GDSS of this nature is too great for most user organizations;
centralized or off-site facilities (leased from or provided by a vendor), used by
many diverse groups, are the norm. Requirements for minimal training time and
the variability of users usually necessitate the use of a robust, speaker independent,
continuous speech system.

There is no perfect solution to all situations. Each installation should be
evaluated on its own merits by well-informed analysts. Section 2 of Appendix C3
provides references to research in this area.

2. Individual Usage 

Voice recognition for individual usage offers the greatest possible 
number of options. Many factors can be considered when optimizing the system, 
which can be speaker dependent or independent, and use continuous or discrete 
recognizers. 

Voice recognition systems can also be used to augment other input
devices, working simultaneously with keyboards and pointing devices. In
the fields of desktop publishing, graphics manipulation, or computer-aided design,
the task of entering text is secondary to the drawing of shapes or manipulation of
objects on a screen. A voice recognition system or a "talkwriter" can be used to
perform the text entry task without breaking the flow of the primary task.

The most important constraint when designing a system is the time and 
effort required for training. References relating voice recognition systems to 
individual users are provided in Section 3 of Appendix C3. 

3. Handicap Situations

A physical handicap does not impair a person's mental ability or ability to
produce. Just as a person with an amputated leg is given a prosthetic device to allow
mobility, a voice recognition system can be used as a prosthesis that can compensate
for some physical handicaps. Much work has been done in this area to bring 
independence, mobility, and productivity to the handicapped. Voice recognition 
systems not only can be used by the handicapped to operate computers, but they also 
can be used to control or manipulate other mechanical devices. 

Wheelchairs, prosthetic devices, communication devices, environmental 
controls, and many other systems may be controlled via the voice. The highly 
individual nature of designing a voice recognition system for the handicapped can 
result in the use of small, lightweight, power-efficient, portable units, fine-tuned
for the user and his or her needs. 

Research related to the handicapped and voice recognition is located in 
Section 4 of Appendix C3. Much of this research is equally applicable for use with 
non-handicapped individuals. 


E. QUANTITATIVE FACTORS 

Some of the benefits or advantages of computer voice recognition systems are 
subjective (user convenience or preference). Other aspects are undeniably 
quantitative. These include response and task time, accuracy, speed of entry, ease 
of use, and user productivity. References that evaluate or discuss these quantitative 
measures are found in Section 1 of Appendix C4. 

1. Time 

Time savings can be measured in many ways. Baker cites data from
experiments showing that communication via typewriter or handwriting cannot even
approach speech in terms of time or task efficiency [Baker 84]. Time savings,
whether measured as hours required to train the user on the system or as actual
hours saved by the use of voice recognition, are significant, especially in common
environments. As voice recognition systems become commonplace and familiar,
the time saved in training personnel is expected to increase.

References in the area of response and task time, related to voice 
recognition systems, are included in Section 2 of Appendix C4. 

2. Accuracy 

One of the selling points of voice recognition systems is the accuracy of 
task performance. Once an utterance is correctly "understood", the system will 
produce a precise and correct output. However, two types of errors may occur: 
rejection and misrecognition. Rejection occurs when a recognizer cannot classify an
utterance at all and so produces no output. Misrecognition happens when a recognizer classifies an
utterance as something other than what was spoken. Since misrecognition is 
potentially more serious, most good recognizers are designed to reject rather than 
guess at marginal pattern matches. 
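The difference between the two error types can be sketched in a few lines of Python. The sketch assumes the recognizer has already given each vocabulary word a similarity score for the incoming utterance (how such scores are computed is outside this example); the function names, scores, and threshold are hypothetical.

```python
def recognize(scores, threshold):
    """Return the best-matching word, or None to signal a rejection.

    scores    -- dict mapping each vocabulary word to a similarity score for
                 the incoming utterance (higher means a closer match)
    threshold -- minimum score needed before the recognizer commits to a word
    """
    best_word = max(scores, key=scores.get)
    if scores[best_word] < threshold:
        return None  # reject: better to ask for a repeat than to guess
    return best_word


def classify_outcome(spoken, recognized):
    """Label one trial as 'correct', 'rejection', or 'misrecognition'."""
    if recognized is None:
        return "rejection"
    return "correct" if recognized == spoken else "misrecognition"
```

With scores of {"left": 62, "right": 81, "stop": 40}, a threshold of 75 yields "right", while a threshold of 90 yields a rejection instead of a guess.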

Experiments have shown accuracy rates ranging from a high of 99.8
percent to a low of 88.6 percent. The accuracy required of a system
depends on the criticality of its application and the consequences of errors in the 
entered data. 
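As a rough illustration of why the required accuracy depends on the application, the expected number of errors $E$ in $N$ recognized utterances at accuracy rate $a$ is $N(1-a)$. Applied to 1,000 utterances at the two rates quoted above, and treating every utterance as equally likely to be misrecognized (a simplifying assumption):

$$
E = N(1 - a), \qquad 1000\,(1 - 0.998) = 2, \qquad 1000\,(1 - 0.886) = 114
$$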

Research has shown that 183 percent more errors occur during manual
data entry (typing) than when a voice recognition system is used [Yellen
83]. Common typing errors such as the transposition of numbers or letters are
almost eliminated with voice recognition. Correct entry of numbers is especially
important because, while automated spelling and grammar checkers can catch most
letter transpositions, they cannot catch transposed digits.
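Stated as a ratio, "183 percent more errors" means typing produced 2.83 times as many errors as voice entry for the same material:

$$
E_{\text{typing}} = (1 + 1.83)\,E_{\text{voice}} = 2.83\,E_{\text{voice}},
\qquad
\frac{E_{\text{voice}}}{E_{\text{typing}}} = \frac{1}{2.83} \approx 0.35
$$

In other words, voice entry produced roughly 35 percent as many errors as typing, a reduction of about 65 percent.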

Voice recognition accuracy can be improved in many ways, as covered in
the Training Factors Section of this Chapter. Briefly stated, recognition accuracy
depends primarily on how the equipment is trained and on the experience level of
the speaker. Computer experience, time of week, accent, vital capacity and rate of 
air flow, speaker cooperativeness, and anxiety all affect accuracy to a lesser extent. 
References providing other data concerning accuracy are included in Section 3 of 
Appendix C4.

3. Speed of Entry

Most researchers agree that speech input is faster than keyboard input. 
Most individuals can speak twice as fast as the average typist can type. With a 
greater number of nontypists gaining access to computers, faster input modes are 
needed. The Macintosh personal computer from Apple uses a pointing device,
pull-down windows, and other enhancements (which augment the keyboard) to
produce a more natural interface. Experiments comparing the Macintosh's pull-down
windows with continuous voice recognition input demonstrated a distinct
advantage for continuous speech over the pull-down window technology of the
Macintosh [Sweeney 86].

In other research, after only three hours of training, subjects were 17 
percent faster using voice entry than typing [Yellen 83]. 

References concerning task completion speed are listed in Section 4 of 
Appendix C4. 

4. Ease of Use 

Various studies have demonstrated that speech input is
easy to learn and easy to use. Users also develop a preference for speech input over
time. References to these studies are located in Section 5 of Appendix C4.

5. Productivity

Computers excel in performing repetitious, time-consuming, and boring
tasks; humans do not. Thus productivity will be increased when such tasks can 
easily be turned over to a computer, especially if voice commands can be used to 
initiate the desired operations. 

One device that uses a voice system to increase productivity is the 
"talkwriter" or voice dictator. As the user speaks, words are recognized, entered 
into a file, and displayed on a screen. When more than one interpretation is 
possible, the system may provide a list of its best guesses; the user selects one. 
Better-developed models have very large vocabularies and automatic sentence 
punctuation. 

References relating voice recognition systems and productivity are listed
in Section 6 of Appendix C4.


F. TRAINING FACTORS 

Training of the user and the voice recognition system is one of the most 
important considerations in the effective implementation of systems. Methods of 
training depend on the type of voice system being implemented: speaker dependent 
or independent systems, and continuous or discrete speech systems. Certain 
training techniques have been developed that can improve recognition accuracy and 
reduce errors. The complete list of references to training is found in Section 1 of 
Appendix C5. 

1. Speaker Dependent Systems 

Speaker dependent systems require that samples of the potential user's
voice be placed in computer memory. The system basically is tuned for each user's
voice. Usually these systems work better than speaker independent systems
because the dependent system contains samples of the actual user's voice [Poock 83].

Speaker dependent systems are well suited to situations where the same 
users perform the same job day in and day out. However, consistency is also a key 
element in successful recognition accuracy: a speaker may talk quite differently 
when training the machine than during operational use. Whenever possible,
training should be conducted in the same environment in which the equipment will be
operated, to minimize variability that may affect recognition accuracy. Other
factors that affect training and recognition accuracy are age, physical condition,
fatigue, stress (emotional or physical), time of week, breath noise, microphone
placement, familiarity, illness, peer pressure, workload, and external noise
changes. When such changes occur, a new "training" session will usually retune
the system and restore accuracy.

Vocabulary size also affects recognition accuracy. As familiarity with a
voice recognition system increases and the vocabulary is expanded, there will be
more utterances that sound alike or similar to the recognizer: the system may start
to reject words that formerly were accepted. Training a troublesome word more than
once, as separate duplicate entries, sometimes will improve recognition of that word.
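The duplicate-word technique can be illustrated with a toy speaker dependent recognizer that keeps every training sample instead of just one. This is a sketch under simplifying assumptions, not a description of any product: each utterance is assumed to be already reduced to a fixed-length feature vector, matching uses plain Euclidean distance, and the class, method, and parameter names are hypothetical.

```python
import math


class TemplateRecognizer:
    """Toy speaker dependent recognizer that stores several templates per word.

    Each template is a fixed-length feature vector captured during a training
    session; real systems use richer features and time alignment.
    """

    def __init__(self, max_distance):
        self.max_distance = max_distance  # farthest match still accepted
        self.templates = {}  # word -> list of feature vectors

    def train(self, word, features):
        # Training the same word again adds a duplicate template instead of
        # overwriting the first one.
        self.templates.setdefault(word, []).append(list(features))

    def recognize(self, features):
        """Return the word with the nearest template, or None to reject."""
        best_word, best_dist = None, math.inf
        for word, samples in self.templates.items():
            for sample in samples:
                dist = math.dist(sample, features)
                if dist < best_dist:
                    best_word, best_dist = word, dist
        return best_word if best_dist <= self.max_distance else None
```

Because a duplicate template is stored under the same word, it produces the same output and simply gives the matcher another chance to find a close fit, which is how it can turn a rejection into a correct recognition.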

References to current research related to speaker dependent systems are 
listed in Section 2 of Appendix C5. 

2. Speaker Independent Systems 

A speaker independent speech system contains algorithms that can handle
many different voices and dialects. The system is designed to recognize the voice
of anyone who uses it, and thus is useful when many people are expected to operate
it daily. Unlike speaker dependent systems, speaker independent systems do not
require samples of a given user's voice. As a result, speaker independent systems 
do not usually perform as well as speaker dependent systems that are tuned to a 
specific user's vocal characteristics. 

Vocabulary size and structure play an especially important part in voice 
recognition accuracy with speaker independent systems. As the size of the 
vocabulary increases, the possibility of confusion between words also increases 
since there is a greater chance that there will be similar sounding words. 
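One rough way to see why larger vocabularies invite confusion is to count word pairs whose pronunciations differ by a single sound. The sketch below assumes each word is spelled as a list of ARPAbet-style phoneme symbols (the spellings and function names are illustrative) and treats a phoneme edit distance of one as "similar sounding"; real recognizers also confuse words for acoustic reasons this measure ignores.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute x -> y
        prev = cur
    return prev[-1]


def confusable_pairs(lexicon, max_distance=1):
    """List word pairs whose phoneme spellings are within max_distance edits."""
    return [(w1, w2)
            for (w1, p1), (w2, p2) in combinations(sorted(lexicon.items()), 2)
            if edit_distance(p1, p2) <= max_distance]
```

For example, a three-word vocabulary of "fire", "stop", and "go" has no such pairs, but adding "hire", "shop", and "top" creates three: fire/hire, shop/top, and stop/top.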

References related to speaker independent voice recognition systems are 
listed in Section 3 of Appendix C5. 

3. Continuous Speech Recognition 

Continuous or connected speech recognition systems can extract 
information from strings of words even though the words run together as in 
natural speech. Continuous speech is much more natural for humans to use than is 
discrete speech, which requires pauses between utterances. During the 1970s, most 
voice recognition systems used discrete speech. More recently, many accurate and 
inexpensive connected speech systems have been developed. 

Continuous speech systems can either be speaker dependent or 
independent. They usually involve larger vocabularies and require more powerful 
computers to run them. "Talkwriter" devices, discussed earlier, are connected
speech systems with very large vocabularies.

A new approach to continuous recognition moves away from whole-word
pattern-matching schemes toward more flexible "phonetic" recognition schemes.
Phonemes, the basic sound units of speech, are the basis for phonetic recognition.
This type of system is trained using words incorporating all combinations of
phonemes. New words can then be formed from these phonemes.
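Phonetic recognition can be pictured as two steps: a front end turns audio into a phoneme sequence, and a pronunciation lexicon turns phonemes into words. The sketch below covers only the second step, for run-together speech, and assumes the phonemes arrive error-free (a large simplification); the lexicon entries and function name are hypothetical. In this sketch, adding a word only means adding its phoneme spelling to the lexicon.

```python
def segment_phonemes(phonemes, lexicon):
    """Split a run-together phoneme sequence into vocabulary words.

    phonemes -- list of phoneme symbols with no word boundaries marked
    lexicon  -- dict mapping each word to its phoneme spelling (a list)
    Returns a list of words covering the whole sequence, or None if no
    combination of lexicon words fits. Uses dynamic programming over
    positions; if several segmentations exist, the first one found is kept.
    """
    n = len(phonemes)
    parses = [None] * (n + 1)  # parses[i] = words covering phonemes[:i]
    parses[0] = []
    for i in range(n):
        if parses[i] is None:
            continue  # no way to reach position i with whole words
        for word, spelling in lexicon.items():
            j = i + len(spelling)
            if j <= n and phonemes[i:j] == spelling and parses[j] is None:
                parses[j] = parses[i] + [word]
    return parses[n]
```

For instance, with spellings for "move", "left", and "two", the unbroken stream M UW V L EH F T T UW segments into ["move", "left", "two"].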

References relating to continuous speech recognition systems are listed in 
Section 4 of Appendix C5. 

4. Discrete Speech Recognition 

Discrete or isolated speech recognition is the process of transforming 
discrete utterances into computer-recognized commands or text. Discrete speech 
contains a significant pause between utterances. A discrete speech recognizer must 
be able to detect a pause or low energy gap in order to function. Humans, however, 
sometimes find it difficult to speak with isolated words or broken phrases; hence 
discrete speech is not the most natural or desirable form of voice recognition. 
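The pause detection that a discrete recognizer depends on can be sketched as a scan over short-time energy values. This is a minimal illustration, assuming energies have already been computed for fixed-length frames of audio; the function name, the energy threshold, and the minimum gap length are hypothetical tuning parameters, not values from any particular system.

```python
def split_utterances(frame_energies, energy_threshold, min_gap_frames):
    """Find isolated utterances in a sequence of per-frame energy values.

    A frame counts as speech when its energy exceeds energy_threshold.
    An utterance ends only after at least min_gap_frames quiet frames,
    so brief dips inside a word do not split it in two.
    Returns (start, end) frame index pairs, with end exclusive.
    """
    utterances = []
    start = None  # first frame of the utterance in progress, if any
    quiet = 0     # consecutive quiet frames since the last speech frame
    for i, energy in enumerate(frame_energies):
        if energy > energy_threshold:
            if start is None:
                start = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_gap_frames:
                utterances.append((start, i - quiet + 1))
                start, quiet = None, 0
    if start is not None:  # speech ran to the end of the recording
        utterances.append((start, len(frame_energies) - quiet))
    return utterances
```

The minimum gap is the key design choice: too short, and a dip inside one word splits it in two; too long, and the speaker must pause unnaturally between words.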

Until recently, almost all commercial applications of voice recognition
technology had been discrete voice recognition systems. Discrete systems still
offer some advantages over continuous recognition systems in the areas of speed,
accuracy, and especially cost. An extensive listing of currently available
commercial voice recognition systems is contained in Appendix E. Usually, unless
a system is advertised as being continuous or connected, it is understood to be of the
discrete variety. References contained in Section 5 of Appendix C5 provide 
additional information about discrete speech recognition. 

5. Recognition Accuracy 

Training plays perhaps the most significant role in recognition accuracy. 
Problems often arise as a result of changes, either with the user or within the 
environment. A computer usually is much more sensitive to these changes than is 
the human. An impartial observer who is trained to detect subtle changes and who
understands the mechanics of the system may be needed for troubleshooting and
system repair. For speaker dependent systems, a simple retraining session may
restore accuracy. The use of vocabulary nodes or subsets can increase both speed 
and accuracy (see the Vocabulary Factors Section). Duplicate words that result in 
the same output string may minimize rejection problems. Increasing the word 
recognition threshold may cause a higher rejection rate but can minimize 
misrecognition. 
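The threshold trade-off can be shown by scoring the same set of trials at several thresholds. The sketch assumes each trial records the word spoken, the recognizer's top candidate, and that candidate's match score (higher is better); the function name and the trial data are invented for illustration and are not experimental results.

```python
def error_rates(trials, threshold):
    """Percent of trials rejected and misrecognized at one recognition threshold.

    trials    -- (spoken_word, best_candidate, match_score) tuples, where
                 best_candidate is the recognizer's top-scoring vocabulary word
    threshold -- minimum score a candidate needs to be accepted
    """
    rejected = misrecognized = 0
    for spoken, candidate, score in trials:
        if score < threshold:
            rejected += 1        # no output; the speaker must repeat
        elif candidate != spoken:
            misrecognized += 1   # wrong word accepted
    total = len(trials)
    return 100.0 * rejected / total, 100.0 * misrecognized / total


if __name__ == "__main__":
    # Hypothetical scored trials, for illustration only.
    trials = [("up", "up", 92), ("down", "down", 85), ("left", "lift", 71),
              ("right", "right", 64), ("stop", "top", 58), ("go", "go", 80)]
    for threshold in (50, 70, 90):
        print(threshold, error_rates(trials, threshold))
```

With these sample trials, raising the threshold from 50 to 90 moves the misrecognition rate from about 33 percent to zero, while the rejection rate climbs from zero to about 83 percent.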

Most systems come from the manufacturer adjusted to an optimal level;
making changes may only decrease performance. The operations manual gives the
best guidance on how manipulating the recognition parameters can
improve or degrade recognition. Publications listed in Section 6 of Appendix
C5 provide additional information on recognition accuracy.


G. HOST COMPUTER FACTORS 

Voice recognition systems have been used successfully on all types and sizes of
computers. Appendix E lists current voice recognition systems and describes the
host computers with which each is compatible. Voice recognition has also been used
in aircraft and spacecraft control, telephones, and robot control; in teaching people
how to speak; and by the handicapped to control body limbs, home appliances,
wheelchairs, and other conveyances.

As voice recognition systems mature they will become smaller and cheaper, and
they will have larger vocabularies and be more robust. As a result, they are expected to
find their way into more computer applications and be involved in more aspects of
human endeavor. Section 1 of Appendix C6 provides a complete list of references
concerning host computer applications for voice recognition.

1. Microcomputers

Voice recognition systems can provide input to microcomputers via many
different configurations, both internal and external. External "voice boxes" are
perhaps the easiest to install and maintain. They are self-contained units that may
have an interchangeable storage medium that allows vocabularies or software to be
swapped or installed. These storage devices can take the form of
floppy disks, tape cartridges, integrated circuit chip cartridges, compact optical
disks, and other types of magnetic and optical storage devices.

A replacement keyboard is one simple and inexpensive way to install a
voice recognition system. These systems require no additional space or alterations
to the microcomputer; they draw their power from the normal keyboard
connection and have ports for the voice recognition microphone and related
switches built into the keyboard. Much of the voice recognition circuitry
that usually is installed on an internal microcomputer board is in the keyboard. The
disk storage device of the computer is used for its vocabulary and other software.
Programming this type of system is easy, as it mimics normal keyboard
keystroke inputs. Other software is unaffected by the system and is unaware that
the user is entering commands via voice rather than by manual keystrokes.
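Because the host computer sees only keystrokes, the core of a replacement-keyboard unit can be pictured as a table that maps each trained word to an output string. The sketch below is hypothetical: the table entries and function name are invented, and the recognition step that produces the word list is left out.

```python
# Hypothetical vocabulary table: each trained word maps to the keystrokes
# that are sent to the host in its place.
OUTPUT_STRINGS = {
    "enter": "\r",
    "total": "=SUM(A1:A10)\r",          # a whole spreadsheet formula, then Enter
    "closing": "Sincerely,\r\rJ. Smith\r",
}


def keystrokes_for(recognized_words, output_strings=OUTPUT_STRINGS):
    """Expand recognized words into the keystroke stream sent to the host.

    Words without a stored output string are skipped here; a real unit
    might sound a tone or display a prompt instead.
    """
    return "".join(output_strings.get(word, "") for word in recognized_words)
```

Since the application only ever receives ordinary characters, it needs no changes to work with voice input, which matches the point made above.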

Another implementation is through the use of an internal plug-in circuit 
card. This card operates in a manner similar to that of the keyboard, with the 
microphone and switches plugging into the card. These cards may incorporate 
other functions such as a modem or speech synthesis unit. 

Some voice systems are actually incorporated into the basic design of the
microcomputer and are internal and omnipresent to its operation. Specific
information on these and other microcomputer voice systems is referenced in
Section 2 of Appendix C6.

2. Mainframes

Mainframe computers may be accessed by the same types of methods as 
those noted for microcomputers. Links from microcomputers used either as dumb 
or intelligent terminals also may be used for access. 

Because of the powerful processors and large, fast-access storage devices 
associated with mainframe computers, much research has been done with voice 
recognition related to large computers. Research literature concerning mainframe 
computers and other large computer applications of voice recognition systems is 
listed in Section 3 of Appendix C6. 

3. Networks 

The use of voice recognition with computer networks is a natural
extension of its use with microcomputers and mainframes.
Separate vocabulary nodes or specialized vocabularies may be used when accessing
different networks. Passwords and entry procedures can be incorporated into the
output strings, removing much of the drudgery related to moving through a
network. The implementation of speech recognition also allows the use of voice
verification as an automatic entry and access device.
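Vocabulary nodes can be sketched as a small state machine: only the words in the current node are eligible for matching, and some words move the recognizer to another node. Restricting the candidate set is one plausible reason nodes can improve both speed and accuracy, since fewer templates are compared and words that are invalid at that point cannot be confused with valid ones. The node names, words, and class below are hypothetical.

```python
# Hypothetical vocabulary nodes: each node lists the words active while it is
# current, and the node each word leads to next.
NODES = {
    "main":    {"network": "network", "edit": "editor"},
    "network": {"connect": "network", "mail": "network", "back": "main"},
    "editor":  {"save": "editor", "print": "editor", "back": "main"},
}


class NodeRecognizer:
    """Restricts matching to the words active in the current node."""

    def __init__(self, nodes, start="main"):
        self.nodes = nodes
        self.current = start

    def active_words(self):
        """Words the recognizer will currently try to match."""
        return set(self.nodes[self.current])

    def accept(self, word):
        """Accept a word only if the current node allows it, then move on."""
        transitions = self.nodes[self.current]
        if word not in transitions:
            return False  # outside the active subset: treated as a rejection
        self.current = transitions[word]
        return True
```

Starting in "main", the word "save" is refused, but after "edit" the recognizer is in the "editor" node, where "save" becomes active.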

Two of the largest networks used today are the telephone network and the 
automatic teller machine networks. Voice recognition systems have been proposed 
for these networks, and development efforts are underway. References related to
voice recognition and networks are contained in Section 4 of Appendix C6.

4. Types of Entry Required

Data entry requirements vary from application to application. Voice 
input can be used to collect data, as in inventory control or quality control and 
assurance situations. Voice input can be used to input data or information into a 
computer, such as in order processing, or to manipulate data, as in automatic 
message preparation. Voice can be used to convert speech to text, as in the
"talkwriter" or automatic dictation machines. Voice can verify data that has been 
entered by others or that has been mechanically or automatically entered via some 
other input device. Voice can be used to control industrial processes, machines, 
and robots. 

Each of these applications requires a different type of system to make it
work optimally. References related to data entry systems are provided in Section 5
of Appendix C6.


H. EXPERIMENTS AND RESEARCH 

A vast amount of research has been conducted in both broad and specific areas 
of voice recognition. Section 1 of Appendix C7 contains references to this 
research. This research is further divided into logical groupings, to allow focused 
study. Section 2 of this Appendix covers research in the area of artificial 
intelligence. Section 3 looks at future research, that is, those areas in which new 
trends are developing or towards which research is predicted to move. Section 4 
deals with present research, covering work done in the last five years. Section 5 
includes literature related to research conducted prior to 1 January 1983. Many 
experiments and case studies have been conducted. Section 6 is devoted to these. 

A special area of interest has evolved relating the field of voice recognition to
the area of natural language interfaces. Dobney states that natural language
interfaces and speech recognition are fifth generation concepts. A natural language
interface allows a user to express his or her request in English. Certain difficulties
arise when using naturally spoken English. One problem is the use of
homonyms, such as "I heard the song" and "I saw a herd of buffalo". A related
difficulty results when phrases sound similar, such as "I scream" and "ice cream"
[Dobney 87]. The human mind has developed ways to sort out these problems;
humans understand the context of what is being said, and are sensitive to shifts in
context. Dobney presents some interface complexities that natural language
processing must address and resolve. Some of these are listed here to demonstrate
the scope of this problem.


• Time flies like an arrow
Fruit flies like a banana.

• You wouldn't recognize Mary now. She's grown another foot.

• Can anyone walk over Niagara Falls on a tightrope?

• A sandwich is better than nothing.
Nothing is better than a good square meal.
Therefore a sandwich is better than a good square meal. [Dobney 87]


The challenge will be to develop machines that will do what we mean, and not
necessarily what we say. Literature documenting research dealing with natural
language interfaces is found in Section 7 of Appendix C7.

III. RESULTS AND CONCLUSIONS


A. RESULTS 

The primary objective of this thesis is to provide a single source of reference to 
enable the selection of an appropriate voice recognition system implementation for 
a given DSS or other computer application. Chapter II, Data Analysis, fulfills this 
objective by providing both a broad overview of voice recognition systems and 
their characteristics and a close-up view of specific categories within voice 
recognition. 

The second objective is to provide a reference guide to current voice 
recognition literature and research. Appendix C is such a guide. It contains an 
annotated bibliography and has subappendices that directly link this bibliography to 
specific areas of research that are discussed in Chapter II. An additional result of 
this study is Appendix D, a complete index of all publishers mentioned in the
bibliography, which should facilitate retrieval of articles that might be difficult to
locate. 

The third objective is to provide a current listing of all commercially available 
voice recognition systems. This listing is contained in Appendix E, and gives each 
manufacturer's name, address and phone number. The various types of voice input 
devices manufactured, their intended use, and their compatibility with current 
computer systems also are provided there. 

The overall goal of this study is to provide a useful guide to help in the decision-making
process concerning the implementation or the use of voice recognition
systems. Information in this study can be used both as an introduction to voice
recognition systems and as a reference source to answer questions on specific
topics. The direct linking of specific topics to a grouping of articles dealing with
each topic allows use of this study as a ready reference source.


B. CONCLUSIONS 

As discussed in Chapter I, the dialog component of decision support systems 
may be the weak link when implementing a DSS. By using voice recognition 
systems to optimize this dialog component, the overall DSS will benefit. 

As noted in the Voice Recognition Systems Section of Chapter I, voice
recognition, like other fifth generation concepts, is expected to be critical to
the future of most computer applications.

Research listed in the Human Factors Section of Chapter II has shown that
stress may result from a fear of new technology. Fear of new technology is not a
recent phenomenon. Fear of voice recognition systems often results from the
user not having been previously introduced to such systems. Fear also can result when the
user is unaware of what voice recognition can actually do (and cannot do).

Despite the importance of voice recognition and its proven value to human
productivity, the volume of recent research is not increasing in proportion to its
perceived importance. This is indicated by the amount of literature referenced
throughout Chapter II: the volume of publications has not grown in recent
years at the rate seen in earlier years.

IV. RECOMMENDATIONS


It is recommended that designers and users of DSSs investigate voice 
recognition systems as a means of optimizing the dialog component of DSSs. As 
noted by Ralph Sprague, describing the future of Decision Support Systems,
"Dialog will profit significantly from the inclusion of natural language processing
techniques and voice recognition" [Sprague 87].

As the reality of fifth generation computer technology approaches, the use of
"intelligent systems" such as natural language processing and voice recognition
systems will allow for both flexible and natural input. Although no one input
method is perfect or even appropriate for all uses, voice systems show promise for
wider application than is presently being implemented.

Widespread acceptance of computer voice recognition can be encouraged by 
proper training and orientation of potential users of such systems. A good training 
and education program in the use and benefits of voice recognition will help 
smooth the path for voice recognition implementation. 

More research is needed in all areas of voice recognition. Only through 
continued research and experimentation can voice recognition systems develop and 
improve. The perceived recent lull in voice recognition research may in part be
due to normal delays in the publishing process or to recent cutbacks of research 
funds. However, since the demand for better input methods continues, research 
must also continue. 

It is hoped that this study can help guide and inspire the use of voice
recognition systems for decision support systems and other computer
implementations. It provides a tool for quick reference to the literature on
specific areas of concern and research within the domain of computer voice
recognition. Continued education and enlightenment should result in progress
and greater acceptance of these systems.
APPENDIX A GLOSSARY OF TERMS


Group Decision Support System (GDSS): a computer-based system that aims at
supporting collective problem solving. A collective decision-making process can
be viewed as a problem-solving situation in which there are two or more persons,
(1) each of whom is characterized by his or her own perceptions, attitudes,
motivations, and personality, (2) who recognize the existence of a common
problem, and (3) who attempt to reach a collective decision. [Bui 86]

Decision Support System (DSS): the application of available and suitable 
computer-based technology to help improve the effectiveness of managed decision 
making in semi-structured tasks. [Keen 78] 

Voice Recognition (VR): the ability of a computer or device to recognize
spoken words correctly and translate those sounds into a predetermined output
string to a computer; also referred to as automatic speech recognition (ASR).
[LeFever 87]

Continuous Speech Recognition: the process of extracting information from
strings of words even though the words run together as in natural speech.
[Yellen 83]

Discrete (Isolated) Speech Recognition: the process of transforming discrete
utterances (those with a significant pause between utterances) into
computer-recognized speech or text.

Utterance (Word): may be a single mono- or polysyllabic word (e.g., select) or
a combination of mono- or polysyllabic words joined into a phrase (e.g.,
select-the-first-choice).




Rejection: the inability of a recognizer to classify an utterance correctly.
[Yellen 83]


Misrecognition: classification by a recognizer of an utterance as something
other than what was spoken.


Speaker Dependent Systems: require adaptation (or "training") of the voice
recognition system to the speech characteristics of each user in order to achieve
recognition.


Speaker Independent Systems: recognize speech regardless of the speaker, and
without system training in recognition of individual speech characteristics of users.
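To make the difference between a rejection and a misrecognition concrete, the
following Python sketch models a discrete-word, template-matching recognizer
with a reject threshold. It is a minimal illustration under stated assumptions:
the function names, the distance values, and the threshold are hypothetical,
and the sketch treats a rejection as the recognizer declining to choose any
vocabulary word. Real recognizers use their own distance measures and scoring.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Decision:
    label: Optional[str]  # None means the recognizer rejected the utterance
    distance: float       # distance to the closest stored template


def recognize(distances: Dict[str, float], reject_threshold: float) -> Decision:
    """Hypothetical recognizer: pick the closest template, or reject if even
    the best match is farther away than the reject threshold."""
    if not distances:
        raise ValueError("vocabulary is empty")
    best_label = min(distances, key=distances.get)
    best_distance = distances[best_label]
    if best_distance > reject_threshold:
        return Decision(label=None, distance=best_distance)
    return Decision(label=best_label, distance=best_distance)


def score(spoken: str, decision: Decision) -> str:
    """Classify one trial using the glossary terms."""
    if decision.label is None:
        return "rejection"
    if decision.label == spoken:
        return "correct"
    return "misrecognition"


if __name__ == "__main__":
    # Each dict maps vocabulary words to made-up distances for one utterance.
    print(score("select", recognize({"select": 0.2, "cancel": 0.9}, 0.5)))
    print(score("select", recognize({"select": 0.7, "cancel": 0.8}, 0.5)))
    print(score("select", recognize({"select": 0.45, "cancel": 0.3}, 0.5)))
```

The three trials print "correct", "rejection" (the best match, 0.7, exceeds the
threshold), and "misrecognition" ("cancel" is closer than "select"). Raising the
threshold trades rejections for misrecognitions, which is one reason the reject
threshold appears as a design factor in the accuracy literature cited in this study.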
(1) All LPC-based distortion measures performed reasonably well. The
LLR and WSM distortion measures gave the highest recognition accuracy,
while the IS distortion measure gave the lowest score; (2) Whereas the
performance, the use of gain and absolute loudness degraded the
performance; (3) Bark-scale frequency warping did not, at least for the 
System, by G. K. Poock and E. F. Roland, March 1983.
[Poock 83-4] National Technical Information Service NPS-55-83-017PR,
Final Summary: Voice Recognition/Input Issues for TACFIRE, by G. K.
Poock and E. F. Roland, March 1983.


[Poock 83-5] National Technical Information Service AD A127223, NPS-
55-83-003, The Effect of Feedback to Users of Voice Recognition
Equipment, by G. K. Poock and B. J. Martin, February 1983.


[Poock 83-6] National Technical Information Service AD A129975, NPS-
55-83-001, Voice Recognition Vocabulary Lists for the Army's TACFIRE
System, by G. K. Poock and E. F. Roland, January 1983.


[Poock 83-7] Poock, G. K., "Speech Recognition Research, Applications 
and International Efforts", Human Factors Society, Spring 1983. 


Gives a broad overview of the speech I/O industry at the national and
international levels. Within this context, technical and human factors issues
that are relevant in all countries are discussed.


[Poock 84] National Technical Information Service AD A142554,
NPS55-84-002, Effects of Emotional and Perceptual Motor Stress on a
Voice Recognition System's Accuracy: An Applied Investigation, by G. K.
Poock and B. J. Martin, February 1984.


[Poock 85] National Technical Information Service AD A158001,
NPS55-85-012, An Examination of Some Error Correcting Techniques for
Continuous Speech Recognition Technology, by G. K. Poock and B. J.
Martin, June 1985.


[Poock 86] Poock, G. K., "A Longitudinal Study of Five Year Old Speech 
Reference Patterns", Journal of the American Voice I/O Society, v. 3, pp. 
13-18, June 1986. 


[Prasad 87] Prasad, K., and Lamba, T., "Natural Language Interfaces
Based on Keyword Extraction Using AWK", Microprocessors &
Microsystems, pp. 157-160, 1 April 1987.


[Pursley 85] Pursley, Roy, "Speech Technology--No Longer Small Talk
for Financial Software Users", Journal of Financial Software, v. 2,
pp. 52-53, March/April 1985.


Points out that speech technology as a means of interfacing with a computer 
is particularly well-suited to use in the financial world. 
[Quarmby 86] Quarmby, D., Silicon Devices for Speech Recognition, pp.
200-215, McGraw-Hill Inc., 1986. ISBN 0-07-007913-7. 


[Reardon 87] Reardon, Tracey A., "Talk About Productivity", Words, 
v. 15, pp. 22-23, December/January 1987. 


Discusses PC-based voice recognition and voice response technology and 
how it enhances the way users do business. 


[Rehsoft 84] Rehsoft, C., "Voice Recognition at the Ford Warehouse in
Cologne", Proceedings of the 1st International Conference of Speech
Technology, p. 103, October 1984.


Voice recognition has proved to be effective with an online shipping system 
at the Ford parts distribution center in Cologne. As one of the very few 
applications of this technology in Europe, this center employs eight parallel
workstations using voice recognition. This paper describes the system, 
especially the hardware and software used, and deals with ergonomic aspects 
to be observed when introducing voice recognition to the factory floor. The 
emphasis of this description is on the results of the system obtained at Ford 
and the consequences drawn from them for the introduction of voice 
recognition in general. 


[Reuhkala 83] Reuhkala, E., "Recognition of Strings of Discrete Symbols
With Special Application to Isolated Word Recognition", Acta Polytechnica 
Scandinavica, pp. 1-92, 1983. 


[Rigoll 84] Rigoll, G., "Experiences in Interfacing Voice-Input/Output 
Devices to Host Computers, NC-Machines and Robots", Proceedings of the 
1st International Conference of Speech Technology, p. 93, October 1984.


The Fraunhofer-Institut für Arbeitswirtschaft und Organisation (IAO) in
Stuttgart performs contract research for industry and government. Several
projects were carried out concerning the integration of voice-input/output
equipment into office automation and production systems, using various
voice-input/output devices and chip sets. Among these projects were the use
of a voice-input device and a voice-output board for NC-machine
programming and the integration of voice-input technology in quality control.
The experiences concerning the industrial application of voice-input/output 
technology and the difficulties in interfacing the devices are presented in this 
paper. 
[Rigsby 82] Rigsby, Mike, Verbal Control With Microcomputers, p. 312,
Tab Books, 1982. 


Provides an overview of speech and the problem it presents for machine 
recognition and a "hands-on" guide for operating a microcomputer that 
recognizes and responds to voice commands. 


[Roberts 86] Roberts, L., and others, "Improving Speaker Consistency in
an Automatic Speech Recognition Framework", Computer Speech and
Language, pp. 61-93, March 1986.


[Rollins 83] Rollins, A., Constantine, B., and Baker, S., "Speech 
Recognition at Two Field Sites, Chi '83", Human Factors in Computing 
Systems, pp. 267-273, 1983. ISBN 0-89791-121-0. 


[Rollins 85] Rollins, A. M., "Speech Recognition and Manner of Speaking 
in Noise and in Quiet”, Human Factors in Computing Systems, pp. 197-199, 
14-18 April 1985. ISBN 0-89791-149-0. 


[Ross 84] Ross, Steve, and MacAllister, Jeff, “Practical and Continuous 
Speech Recognition", Computer Design, v. 23, pp. 69+, 15 June 1984. 


Presents a continuous speech recognition system that accepts sentences of 
any length, and permits cost-effective voice-data entry in demanding real- 
world environments. 


[Rossi 83] Rossi, M., Nishinuma, Y., and Mercier, G., "Multi Speaker", 
(FRENCH), Speech Communication, v. 2, n. 2-3, pp. 215-217, July 1983. 


We present an algorithm for the recognition of vowels using acoustic cues 
other than formant values. The acoustic cues presented make use of 
information relative to the spectral or temporal distribution of energy. 
These cues are context-independent and we obtained a mean rate of 
recognition of 92% for several speakers. The most efficient cues were those 
of the features open/close and front/back; the cues of nasality, on the other 
hand, showed greater intersubject variability and defined distinct classes of 
speakers. The context independence of the cues with isolated words leads us
to expect good results for continuous speech. 


[Saitta 83] Saitta, L., "Experiments in Evidence Composition in a Speech
Understanding System", International Journal of Man-Machine Studies,
v. 19, pp. 19-31, July 1983.
A method for composing partial evidence in pattern recognition problems
is presented, and experimental results referring to speech understanding are
also discussed.


The method is well suited for real-time problems, where speed and 
parallelism in taking decisions are fundamental requirements. The case 
study presented in the paper is a simple one, for the sake of clarity, but a 
generalization to complex production systems can be easily obtained. 


[Salfer 85] Salfer, D.L., Voice Automation Of Ship Control, Master's 
Thesis, Naval Postgraduate School, Monterey, California, September 1985. 


This thesis explores possible shipboard application of speech recognition 
technology. It includes a detailed analysis of tasks performed on the bridge, 
in the Combat Information Center and in the main engineering control space 
of an FFG-7 Frigate. 


[Santarelli 84] Santarelli, Mary-Beth, "Voice Recognition: Not Just a Lot of
Talk", Software News, v. 4, pp. 44-45, December 1984. 


Explains that while voice recognition has been successfully used in factories 
for quality assurance and inventory applications, it may not be sophisticated 
enough to be used in the office environment. 


[Scagliola 83-1] Scagliola, C., "Continuous Speech Recognition Without
Segmentation: Two Ways of Using Diphones as Basic Speech Units", Speech 
Communication, v. 2, n. 2-3, pp. 199-201, July 1983. 


[Scagliola 83-2] Scagliola, C., "Language Models and Search Algorithms for
Real-Time Speech Recognition", International Journal of Man-Machine
Studies, v. 22, pp. 523-547, 1983.


In this paper, the “continuous speech recognition" problem is given a clear 
mathematical formulation as the search for that sequence of basic speech 
units that best fits the input acoustic pattern. For this purpose spoken 
language models in the form of hierarchical transition networks are 
introduced, where lower level subnetworks describe the basic units as 
possible sequences of spectral states. The units adopted in this paper are 
either whole words or smaller subword elements, called diphones. The 
recognition problem thus becomes that of finding the best path through the 
network, a task carried out by the linguistic decoder. By using this 
approach, knowledge sources at different levels are strongly integrated. In 
this way, early decision making based on partial information (in particular 
any segmentation operation or the speech/silence distinction) is avoided: 
usually this is a significant source of errors. Instead, decisions are deferred
to the linguistic decoder, which possesses all the necessary pieces of 
information. 


The properties that a linguistic decoder must possess in order to operate in
real time are listed, and then a best-few algorithm with partial traceback of
explored paths, satisfying the above requisites, is described. In particular,
the amount of storage needed is almost constant for any sentence length, and
the interpretation of early words in a sentence may be possible long before 
the speaker has finished talking. Experimental results with two systems, one 
with words and the other with diphones as basic speech units, are reported. 
Finally, relative merits of words and diphones are discussed, taking into 
account aspects such as the storage and computing time requirements, their 
relative ability to deal with phonological variations and to discriminate 
between similar words, their speaker adaptation capability, and the ease with 
which it is possible to change the vocabulary and the language dependencies. 


[Scagliola 84] Scagliola, C., and Marmi, L., "A Continuous Speech 
Recognition Based on a Diphone Spotting Approach", Cybernetic Systems: 
Recognition, Learning, Self-Organization, pp. 73-83, Research Studies 
Press, Ltd., 1984. ISBN 0-471-902195. 


[Schalk 82] Schalk, Thomas B., Fantz, Gene A., and Woodson, Larry,
“Voice Synthesis and Recognition", Mini Systems, v. 15, pp. 146+, 
December 1982. 


[Schalk 83] Schalk, T. B., and Van Meir, E. L., “Terminals, Listen Up, 
Speech Recognition is a Reality", Computer Decisions, pp. 97-104, 
September 1983. 


[Schmandt 85] Schmandt, C., Voice Communication With Computers, pp. 
133-160, Ablex Publishing Company, 1985. ISBN 0-89381-244-1. 


[Schotola 84] Schotola, T., "On the Use of Demisyllables in Automatic Word
Recognition", Speech Communication, v. 3, n. 1, pp. 63-87, April 1984.


This paper describes experiments on automatic speech recognition using 
demisyllables as segmentation units and the consonant clusters contained 
therein as decision units for classification. As compared to the large number 
of different demisyllables, the use of consonant clusters reduces the class 
inventory considerably. In order to test the method, three experiments 
dealing with isolated German words were carried out. In the first 
experiment the syllabic segmentation of words was investigated; in the 
second experiment the methods for classification of consonant clusters were
tested. In the third experiment a complete 1000-word recognition system
was developed which performed the segmentation, the classification of
consonant clusters and vowels, and a correction of recognition errors by use
of a phonetic lexicon. Demisyllable segmentation and processing have
proved suitable, especially for large vocabularies.


[Scott 83] Scott, Brian L., "Voice Recognition Systems and Strategies",
Computer Design, v. 22, pp. 67-70, January 1983.


Describes word verification as an approach to voice recognition that 
overcomes the processing and memory-intensive demands of large system 
vocabularies. 


[Seaman 82] Seaman, John, "Voice: New Ways With an Old Medium",
Computer Decisions, v. 14, pp. 62+, March 1982. 


Discusses applications of voice processing and describes voice processing 
equipment for data entry (recognition) and response (synthesis). 


[Seaman 83] Seaman, John, "The Latest Word in Voice Recognition",
Computer Decisions, v. 15, pp. 48+, February 1983.


Examines the new Votan Model V5000 voice recognition and voice response 
unit. 


[Seaman 85] Seaman, J., Voice: New Ways With an Old Medium, pp. 85-91,
Hayden Book Co., 1985. ISBN 0-8104-6329-6.


[Senensieb 84] Senensieb, G. A., "Speech Input and Output--A Survey of
Available Products", Proceedings of the 1st International Conference of
Speech Technology, p. 57, October 1984.


The capabilities of current speech input and output technology are explained 
and assessed with reference to a selection of existing products. Included in 
the survey are speech recognition products, single synthesizers, and text-to- 
speech systems. The tangible benefits of applying speech technology are 
summarized, and the author's view of a challenge for the future is presented.


[Shapiro 84] Shapiro, E., "A Business Computer, A Business Program, and 
More on Voice Recognition", Byte, pp. 147-154, February 1984. 


Recent developments raise some questions about perceived industry trends. 
[Shapiro 85] Shapiro, S. F., "Speech Recognition Produces Natural
Interface", Computer Design, pp. 59-62, March 1985. 


[Shore 83] Shore, J. E., and Burton, D. K., "Discrete Utterance Speech
Recognition Without Time Alignment", IEEE Transactions on Information
Theory, pp. 472-491, July 1983.


[Silverman 85] Silverman, H. F., “One Architectural Approach for Speech 
Recognition Processors", Algorithmically Specialized Parallel Computers, 
pp. 129-148. Academic Press, Inc., 1985. ISBN 0-12-654130-2. 


[Siroux 85] Siroux, J., and Gillet, D., "A System for Man-Machine
Communication Using Speech", Speech Communication, v. 4, pp. 289-315,
December 1985.


KEAL is a continuous speech recognition system developed at the CNET 
laboratory in Lannion (France). Part of the laboratory's current work aims 
at extending it in the direction of a speech-understanding and man-machine 
dialog system. A question-answer-type dialog is set in motion in order to 
provide the user with information (the current application consists in 
simulating a directory inquiries service). This paper describes how 
syntactic, semantic, and pragmatic knowledge is used for implementing such 
a dialog, and the main advantages and drawbacks of the methods chosen are 
discussed. Sentence recognition is performed by a left-to-right bottom-up
parser by means of a semantic context-free grammar. Using a method
analogous to that of semantic attributes, the parse tree is then interpreted in
order to obtain a semantic structure which represents the information
relevant to the subsequent dialog. The dialog manager uses the semantic
structure for instantiating a model graph, which represents the state of the
dialog at any instant; it indicates the next message to be sent to the user and
how to analyze his answer. An example derived from the directory
inquiries service is described.


[Smith 83] Smith, F. J., and Linggard, R. J., "Information Retrieval by
Voice Input and Output", Research and Development in Information
Retrieval, pp. 275-288, Springer-Verlag New York, Inc., 1983.
ISBN 0-387-11978-7.


[Smith 84] Smith, Emily T., and Harris, Marilyn A., "More Than a
Whisper of Hope for Computers You Can Talk To", Business Week,
p. 92F-H, 17 December 1984.


Examines the new IBM experimental computer which has a system capable 
of recognizing 5,000 spoken words with 95% accuracy. 
[Spine 84] Spine, T., Williges, B. H., and Maynard, J. F., "An Economical
Approach to Modeling Speech Recognition Accuracy", International
Journal of Man-Machine Studies, v. 21, pp. 191-202, September 1984.


Accuracy of speech recognizer decisions is an important criterion for 
maintaining both system effectiveness and user satisfaction. A central- 
composite design methodology is recommended as an economical means to 
develop empirical prediction equations for speech recognizer performance 
incorporating a number of influential factors. Factors manipulated in the 
central-composite design included number of training passes, reject 
threshold, difference score, and size of the active vocabulary. The factorial 
combination of two noncontinuous variables, sex of the speaker and inter- 
word confusability, was also investigated by replicating the central- 
composite design to create four sets of data. Standard least-squares multiple 
regression analysis was used to develop the four sets of prediction equations, 
each of which accounted for at least 50% of the variance in recognizer 
performance. A cross-validation study revealed that shrinkage was not 
excessive. Subsequently, these empirical models were incorporated into an 
interactive design tool for a dialogue author where the percentage of correct 
recognition is automatically optimized when the dialogue author enters the 
size of the vocabulary to be used or both the vocabulary size and desired 
number of training passes. The design tool can also be used to make 
predictions anywhere within the response surface. Use of these efficient 
data collection procedures along with the interactive design tool should 
greatly assist the dialogue author in predicting the impact of various 
language, task, environmental, algorithmic, human, and performance 
evaluation factors on speech recognition accuracy. 


[Stephens 83] Stephens, Ron, "Make the Way for Another Revolution",
Modern Offices, v. 28, pp. 96+, October 1983. 


Suggests that many of the current methods of communicating and
manipulating information that have traditionally depended on keyboard
entry may soon be replaced by voice-based procedures, causing a major
transformation of the automated office.


[Strat Inc 81] Voice Input/Output: Markets, Technologies & Applications,
p. 110, Strategic Inc., 1981. 


Analyzes the advantages of voice I/O, the state of the market, technology
trends in speech synthesis, and future applications, including voice response,
text-to-voice, language translation, aids to the handicapped, and computer
output. Electronic voice mail, dictation/word processing, computer I/O
automation, games, etc., are also included.
[Sweeney 86] Sweeney, M. J., and Bitar, K. J., An Analysis of Friendly
Input Devices for the Control of the Naval Warfare Interactive Simulation 
System, Master's Thesis, Naval Postgraduate School, Monterey, California, 
March 1986. AD $9333. 


This thesis describes an experiment conducted at the Naval Postgraduate 
School (NPS) during the period 15 October through 28 October 1985. 
Specifically, the experiment evaluates “pull-down window" micro-computer 
technology, continuous speech recognition equipment, and standard 
computer keyboard entry to input commands and control environment. 
Using the Naval Warfare Interactive Simulation System (NWISS) as a 
controlled medium, military problems were posed to test subjects in specific 
light and noise environments. Although the results are not entirely 
conclusive, they do demonstrate a distinct advantage in using continuous 
speech or keyboard entry modes over the drop-down window technology of 
the Macintosh (if subject training time is not a significant restriction). 
Either the continuous speech or the keyboard method was clearly superior in 
all environments. 


[Taggart 81] National Technical Information Service AD-A105 568, Voice
Recognition as an Input Modality for the TACCO Preflight Data Insertion 
Task in the P-3C Aircraft, by John Laughlin Taggart and Charles Darwin 
Wolfe, Jr., p. 150, March 1981. 


Reports the results of an experiment to compare accuracy and entry speed 
capabilities of a standard keyboard with the Threshold Technology T-600 
voice recognition unit in the performance of an operational data entry task 
in the P-3C aircraft. 


[Tanaka 83] Tanaka, A., and others, "A Study of the Syllable Oriented
Recognition of Continuous Speech", Speech Communication, v. 2, n. 2-3, pp. 
207-210, July 1983. 


[Taylor 86] Taylor, M., Voice Input Applications in Aerospace, pp. 322- 
337, McGraw-Hill Inc., 1986. ISBN 0-07-007913-7. 


[Tecosky 86] Tecosky, T., Interfacing Standards for Recognizers, pp. 244- 
255, McGraw-Hill Inc., 1986. ISBN 0-07-007913-7. 


[Teja 83] Teja, E. R., and Gonnella, G., Voice Recognition Technology,
p. 212, Reston Publishing Co., 1983. ISBN 0835984176. 
[Thompson 84] Thompson, H., "Artificial Intelligence and Speech Processing:
The Good News and the Bad News", Proceedings of the 1st International
Conference of Speech Technology, p. 217, October 1984.


Discusses the author's expectations about what Artificial Intelligence can
and cannot contribute to Speech Processing over the next few years.


[Thompson 85] Thompson, Linde, "Voice Recognition Systems: A Sound 
Investment in the Future", News 34-38, pp. 59+, March 1985. 


Looks at the present and the future uses of voice recognition. 


[Tyler 86] Tyler, J., "Speech Recognition System Using Walsh Analysis
and Dynamic Programming", Microcomputers & Microsystems,
pp. 427-433, October 1986.


[Underwood 84] Underwood, M. J., "Human Factors Aspects of Speech
Technology", Proceedings of the 1st International Conference of Speech
Technology, p. 223, October 1984.


Regards speech technology as a means to an end, and not an end in itself. 
Discusses the human component in the speech technology system and its 
importance. 


[Viglione 84] Viglione, S. S., "Trends in Development of Speech
Recognition Systems", Proceedings of the 1st International Conference of
Speech Technology, p. 169, October 1984.


Discusses the inherent superiority of speech over other modes of human 
communications and the growing need for better control of complex 
machines. Discusses the major role of man-machine communication 
through the use of speech recognition and speech response systems. 


[Viglione 86] Viglione, S., Recognition Past and Future, pp. 373-387, 
McGraw-Hill Inc., 1986. ISBN 0-07-007913-7. 


Discusses the inherent superiority of speech over other modes of human
communication and the growing need for better control of complex
machines. Discusses the major role of man-machine communication through the
use of speech recognition and speech response systems.


[Visser 87] Visser, Roger, "Voice Recognition Fills Technical Barriers",
Manufacturing Engineering, v. 98, pp. CT-24 to CT-26, May 1987.
Discusses voice recognition, the technology which allows people to interact
with computers using voice instead of keyboards and terminals and which 
has been successfully implemented by numerous manufacturers from steel 
and car makers to circuit board designers. 


[Wagner 87] Wagner, M., "A Speech Recognition Experiment With the 
Entire Syllable Inventory of Standard Chinese", Speech Communication, v. 
6, pp. 363-369, 1 December 1987. 


This paper explores the possibility of using automatic speech recognition as 
a front end to a computer for Chinese character processing. A speech 
recognition experiment has been performed with the complete inventory of 
second-tone syllables of Standard Chinese. Two recordings of this
inventory, made 48 hours apart, were used as test
and reference sets. It is shown that the distribution of intrasyllable distances
and the distribution of intersyllable distances overlap considerably for the 
full inventory of 260 second-tone syllables. The recognition rate was 
determined as a function of the syllable size and is 47.3% for the complete 
syllable inventory. 


[Watrous 85] Watrous, Raymond, "Speech Input/Output: Support for
Integration", Journal of Computer-Integrated Manufacturing Management,
v. 1, pp. 37-44, Spring 1985.


Describes the current status of speech I/O technology and defines some of 
the terminology associated with the technology followed by a discussion of 
the technology's advantages and successful use. 


[Wetterlind 86] Wetterlind, Peter James, "A Speech Error Correction
Algorithm for Natural Language Input Processing", Computer Science, v.
17, p. 300, 1986, UMI order number: AD A86-25455.


This research experiment consisted of construction of a system for 
identifying a natural language sentence using only speaker independent 
phonemes as the input. The motivating hypothesis for the experiment is that 
spoken sentences can be recognized from limited phoneme input. The 
research system accepts only strings of consonant phonemes, which are 
recognizable in a speaker independent environment. The original 'spoken'
sentence is reproduced from the consonant phonemes and formatted as a
word sequence for subsequent transmission to a natural language processing
system. The system uses a vocabulary of general words and an expandable 
dictionary of domain specific words during the sentence recognition 
process. 
[White 84] White, G. M., "Speech Recognition: An Idea Whose Time is
Coming", Byte, pp. 213-225, January 1984. 


Some theoretical and practical aspects of this emerging technology are
presented.

[Wilson 84] Wilson, J., "Where Do We Go from Here?", Proceedings of
the 1st International Conference of Speech Technology, p. 181, October
1984.


Discusses the background and evolution of future speech technology 
products and services. 


[Williams 85] Williams, John M., "Computer Knows its Programmer's
Voice", Government Computer News, v. 4, p. 32, 5 July 1985. 


Discusses a quadriplegic's voice recognition system which allows him to
perform the same tasks as other computer programmers. 


[Withers 83] Withers, S. J., "Voice Control of an Interactive Simulation",
Simulation, pp. 28-29, January 1983. 


A low cost, microcomputer-based voice recognition device makes a 
convenient input channel for an interactive model of a manufacturing 
system. The problems with current hardware are its limited capabilities and 
unreliable operation. However, the potential exists for useful voice control 
of simulations in the near future. 


[Wood 86] Wood, Lamont, "Voices in the Wilderness", Computer
Decisions, v. 18, pp. 34+, 8 April 1986. 


States that voice recognition is a long way from becoming a widely accepted 
office technology but, nevertheless, today's voice recognition systems do 
have valuable applications, especially on the shop floor and in the 
warehouse. 


[Woods 85] Woods, Tom, "Computers Learn to Listen", Business 
Computer Systems, v. 4, pp. 80+, March 1985. 


Suggests that today's pioneering speech recognition products provide a
glimpse of the exciting technologies and diverse business applications soon
to come.


[Wyatt 85] Wyatt, Jim, and Elbon, Dave, "Computers That Listen and 
Talk", Cause/Effect, v. 8, pp. 9+, July 1985. 
Points out that, when considering voice input/output, the terms voice storage
and playback, voice recognition, and voice synthesis can be used to
characterize the tasks being performed, and explains each term.


[Yalabik 84] Yalabik, N., and Unal, F., "An Efficient Algorithm for 
Recognizing Isolated Turkish Words", New Systems and Architectures for 
Automatic Speech Recognition and Synthesis, pp. 419-426, 2-14 July 1984.


[Yannakoudakis 85] Yannakoudakis, E. J., "Voice I/O: Problems and 
Perspectives", Computer Bulletin, v. 1, pp. 10-12, September 1985.


Discusses one university's approach to computer voice I/O with the
playback or recognition of speech units through the application of rules in an
algorithmic manner. 4 references.


[Yellen 83] Yellen, H. W., A Preliminary Analysis of Human Factors 
Affecting the Recognition Accuracy of a Discrete Word Recognizer for C3 
System, Master's Thesis, Naval Postgraduate School, Monterey, California, 
March 1983. AD A128546. 


Literature pertaining to voice recognition abounds with information
relevant to the assessment of transitory speech recognition devices. In the
past, engineering requirements have dictated the path this technology
followed, but other factors also influence recognition accuracy.
This thesis explores the impact of human factors on the successful
recognition of speech, principally addressing the differences or variability
among users. A Threshold Technology T-600 with a 100-utterance
vocabulary was used to test 44 subjects. A statistical analysis was conducted on five
generic categories of human factors: occupational, operational,
psychological, physiological, and personal. How the equipment is trained
and the experience level of the speaker were found to be the key characteristics
influencing recognition accuracy. To a lesser extent, computer experience,
time of week, accent, vital capacity and rate of air flow, speaker
cooperativeness, and anxiety were found to affect the overall error rate.


[Zue 83] Zue, V. W., "The Use of Phonetic Rules in Automatic Speech 
Recognition", Speech Communication, v. 2, n. 2-3, pp. 181-186, July 1983. 


[Zue 84] Zue, V. W., and Huttenlocher, D. P., "Computer Recognition 
of Isolated Words from Large Vocabularies: Lexical Access Using Partial 
Phonetic Information", Institute of Information Science, pp. 343-347, 1984.


PERIPHONICS CORP.

4000 Veterans Memorial Hwy.

Bohemia, New York 11716 

(516)467-0500 

TeleMarketer: Voice Input (for CDC; DG; DEC; HIS; IBM; NCR; Unisys; 
Wang; PABX; ACD) 

VoicePac Announcement System: Voice Input/Output (for CDC; DG; 
DEC; HIS; IBM; NCR; Unisys; Wang; PABX; ACD) 


SCOTT INSTRUMENTS CORP. 
1111 Willow Springs Dr. 
Denton, Texas 76205 
(817)387-9514 


Coretechs VET-3 Voice Entry Terminal: Voice Input/Output (for RS-232C)

Shadow/VET Voice Entry Terminal: Voice Input (for Apple) 


SHURE BROTHERS, INC.
222 Hartrey Ave. 

Evanston, Illinois 60202-3696 
(312)866-2200 


SM10 Headset Microphone: Voice Input (for OEM)
VR 230 Two Way Headset: Voice Input/Output (for OEM) 


VR300 Gooseneck Microphone: Voice Input (for OEM) 


503BG Close-Talk Microphone: Voice Input (for OEM) 
512 Two Way Headset: Voice Input/Output (for OEM) 


SPEECH, LTD. 

3790 El Camino Real, Suite 213 
Palo Alto, California 94306 
(415)858-2207 


Protalker: Voice Input/Output (for IBM; OEM; Microcomputer) 


SPEECH SYSTEMS, INC. 
18356 Oxnard St. 

Tarzana, California 91356 

DS100 Phonetic Engine: Voice Input (for RS-232C)
PE200 Phonetic Engine: Voice Input (for IBM; RS-232C)


SUDBURY SYSTEMS, INC. 
Sudbury, Massachusetts 01776
(617)443-8966 or 1(800)245-7817 


RTAS: Voice Input/Output 


SUNCOAST SYSTEMS, INC. 
3100 McCormick St., 

Suite 22, P.O. Box 7105 
Pensacola, Florida 32514 
(904)478-6477 or 1(800)843-9363 


Computerfone: Voice Input/Output (for OEM) 


TECMAR, INC. 
6225 Cochran Rd. 
Solon, Ohio 44139 
(216)349-1009 


Voice Recognition Board: Voice Input (for IBM PC) 


TEXAS INSTRUMENTS, INC. 
P.O. Box 655012 

Dallas, Texas 75265 
1(800)527-3500 


Speech Command System: Voice Input/Output (for IBM; TI) 


VOICE COMPUTER TECHNOLOGIES CORP. 
5730 Oakbrook Pkwy 

Norcross, Georgia 30093-1888 

(404)441-2303 


VCT Series 2000 Model 2016: Voice Input/Output (for CDC; DG; DEC; 
HIS; IBM; NCR; Unisys; Microcomputer) 


THE VOICE CONNECTION 

17835 Sky Park Circle, Suite C 

Irvine, California 92714 

(714)261-2366 

IntroVoice I: Voice Input (for Apple II, Apple IIe; RS-232C)
IntroVoice II: Voice Input (for Apple) 

IntroVoice II: Voice Input (for IBM PC, XT, AT) 


IntroVoice V: Voice Input (for IBM; Compaq 386) 
IntroVoice VI: Voice Input (for IBM PS/2, PC, XT, AT; Compaq 386) 


PVDL (Portable Voice Data Logger): Voice Input/Output (for IBM) 
VMC 2020: Voice Input (for Apple II, IIe)


VOICE INDUSTRIES CORP. (VERBEX) 
10 Madison Ave. 

Morristown, New Jersey 07960 
(201)267-7505 


Series 4000; 5000: Voice Input (for RS-232C) 


VOTAN 

4487 Technology Dr. 
Fremont, California 94538 
(415)490-7600 


Voice Management System: Voice Input/Output (for RS-232C; 
Centronics parallel) 

Votan Voice Card (Board Level): Voice Input/Output (for IBM) 
VSP 1000 (Board Level): Voice Input/Output (for IEEE-786)
VTR 3270: Voice Input/Output (for IBM; Coax)

VTR-6050 Series II: Voice Input/Output (for RS-232C) 


VYNET CORP. 

180 Knowles Dr. 

Los Gatos, California 95030 

(408)370-0555; (408)370-9764; or 

1(800)538-7002 

V2100 Telephone Voice Response System: Voice oe (for IBM) 
V2301/V1202/V2202 Telephone Speech Digitizer & Playback:
Voice Input/Output (for IBM)


V4000 Telephone Voice Response System: Voice Input/Output (for IBM) 


XTRA BUSINESS SYSTEMS 
2350 Qume Dr. 

San Jose, California 95131
(408)945-8950 


Voice Communications System: Voice Input/Output (for XTRA Series) 